Mako hSDM BRT explore (Background PAs)

Author

Emily Nazario

Published

July 15, 2024

On this document, I’ve included the results from the initial exploration into the different model outputs, ranking of covariate influence, performance metrics, and prediction maps. This first set of models only includes extracted covariate data at a daily temporal resolution, but I am also considering exploring models that include covariate data at a seasonal or annual temporal resolution. The pseudo absences used in these models were generated using background sampling approaches. Lastly, hyperparameters were tuned using the caret package and across all models, a learning rate of 0.05 and tree complexity of 3 resulted in the highest accuracy. Lastly, the ‘pred_var’ predictor is a random set of numbers that will be used to identify which predictor variables should be included in the final model, and which are not informative.

The hypotheses I would like to test with these models are as follows:

H1: The AGI model will perform better than the dissolved oxygen and null model, and the dissolved oxygen model will perform better than the null model.

study objective being met: Which model performs the best and presents the best predictions (i.e., best predictive performance scores, most ecologically realistic suitability maps)?

H2: The inclusion of dissolved oxygen at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the dissolved oxygen model considering surface values alone.

study objective being met: How does dissolved oxygen at different depths influence habitat suitability predictions relative to oxygen at the surface?

H3: The inclusion of the AGI at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the AGI model considering surface values alone.

study objective being met: How does the aerobic growth index (AGI; environmental oxygen supply:theoretical oxygen demand) at different depths influence habitat suitability predictions relative to the aerobic growth index at the surface?

H4: There will be important relationships between dissolved oxygen/the AGI and latitude/distance to coast.

study objective being met: Are there any important relationships between dissolved oxygen or AGI at the surface or at depth and latitude or distance to the coast?

H5: The null model will predict higher habitat suitability in areas or during seasons or periods (upwelling or La Niña) with lower dissolved oxygen through the water column relative to the dissolved oxygen and AGI models.

study objective being met: How do the habitat suitability maps differ between the models? How do these variations compare for different points in time?

Base models

These three models represent three different options for the base model and either include spatial predictors, a tag ID predictor, both, or neither. These models were developed by splitting the data set into 75/25 train/test, and thus that is the model evaluation approach used here. However, once a model is selected, I can run additional evaluation metrics (i.e., LOO, k-fold). I can also complete these now depending on when that is typically performed.

base_Nspat_Ntag <- readRDS(brt_outputs[7])

# model performance 
ggBRT::ggPerformance(base_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862741
Residual.Deviance  0.2861955
Correlation        0.9282161
AUC                0.9929000
Per.Expl          79.3550598
cvDeviance         0.5921329
cvCorrelation      0.8021263
cvAUC              0.9463200
cvPer.Expl        57.2860162
#relative influence of predictors
ggBRT::ggInfluence(base_Nspat_Ntag) 

             rel.inf
bathy_mean 37.846057
temp_mean  24.062148
sal_mean    7.155808
chl_mean    6.204518
ssh_mean    5.944164
uostr_mean  4.190075
vostr_mean  3.770208
bathy_sd    2.868287
mld_mean    2.643751
uo_mean     2.213286
vo_mean     1.880648
pred_var    1.221051
#explore partial plots
gbm.plot(base_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
base_int <- gbm.interactions(base_Nspat_Ntag)
base_int$rank.list
  var1.index var1.names var2.index var2.names int.size
1         10 bathy_mean          2  temp_mean   854.94
2         10 bathy_mean          3   sal_mean   698.33
3         10 bathy_mean          8   ssh_mean   563.67
4          8   ssh_mean          2  temp_mean   551.11
5         10 bathy_mean          4    uo_mean   540.01
6          3   sal_mean          2  temp_mean   399.47
7          2  temp_mean          1   chl_mean   346.97
#predictive performance using test dataset 
preds <- predict.gbm(base_Nspat_Ntag, dat_test_base,
                     n.trees = base_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_base$PA, preds) #get % deviance
[1] 0.3752151
dat_pred_base <- cbind(dat_test_base$PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1] == 1, 2]
abs_base <- dat_pred_base[dat_pred_base[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_base, a = abs_base)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7396679
max(e@TPR + e@TNR -1) #TSS
[1] 0.8720777
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(base_Nspat_Ntag)
[1] 79.35506
#eval 75/25
eval_7525_modified(base_Nspat_Ntag, 
                 testInput = dat_test_base, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4450 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2282071 0.8916574 0.9800992 0.9987515         0.7293395 0.7935506
base_Nspat_Ytag <- readRDS(brt_outputs[8])

# model performance 
ggBRT::ggPerformance(base_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862741
Residual.Deviance  0.1110165
Correlation        0.9807084
AUC                0.9998000
Per.Expl          91.9917378
cvDeviance         0.3495621
cvCorrelation      0.8963288
cvAUC              0.9783500
cvPer.Expl        74.7840586
#relative influence of predictors
ggBRT::ggInfluence(base_Nspat_Ytag) 

              rel.inf
bathy_mean 32.1041550
tag        24.5344527
temp_mean  18.3478756
uostr_mean  5.0132259
ssh_mean    4.4470038
sal_mean    4.3156858
chl_mean    3.5816288
vostr_mean  2.8189697
bathy_sd    1.2908042
vo_mean     1.2581968
uo_mean     0.9661739
mld_mean    0.8414705
pred_var    0.4803574
#explore partial plots
gbm.plot(base_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
base_int <- gbm.interactions(base_Nspat_Ytag)
base_int$rank.list
  var1.index var1.names var2.index var2.names int.size
1          4   sal_mean          1        tag  1956.77
2         11 bathy_mean          1        tag   827.44
3          3  temp_mean          1        tag   701.21
4          2   chl_mean          1        tag   665.02
5          9   ssh_mean          1        tag   579.04
6          7    vo_mean          1        tag   394.71
7         11 bathy_mean          3  temp_mean   305.91
8          6 uostr_mean          1        tag   288.92
#predictive performance using test dataset 
preds <- predict.gbm(base_Nspat_Ytag, dat_test_base,
                     n.trees = base_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_base$PA, preds) #get % deviance
[1] 0.1818552
dat_pred_base <- cbind(dat_test_base$PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1] == 1, 2]
abs_base <- dat_pred_base[dat_pred_base[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_base, a = abs_base)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7593379
max(e@TPR + e@TNR -1) #TSS
[1] 0.9572719
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(base_Nspat_Ytag)
[1] 91.99174
#eval 75/25
eval_7525_modified(base_Nspat_Ytag, 
                 testInput = dat_test_base, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5800 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1469908 0.956733 0.9939051 0.9935086         0.8688192 0.9199174
base_Yspat_Ytag <- readRDS(brt_outputs[9])

# model performance 
ggBRT::ggPerformance(base_Yspat_Ytag)
                      Model 1
Total.Deviance     1.38627408
Residual.Deviance  0.09036597
Correlation        0.98482457
AUC                0.99990000
Per.Expl          93.48137767
cvDeviance         0.29733241
cvCorrelation      0.91408473
cvAUC              0.98300000
cvPer.Expl        78.55168620
#relative influence of predictors
ggBRT::ggInfluence(base_Yspat_Ytag) 

              rel.inf
dist_coast 52.3832103
tag        20.2478553
lat         9.1003879
temp_mean   4.2416317
bathy_mean  3.5147582
chl_mean    2.7750598
sal_mean    2.0626823
ssh_mean    1.2159874
vostr_mean  1.1867095
uo_mean     0.6788234
vo_mean     0.6673415
uostr_mean  0.5772978
bathy_sd    0.5066506
mld_mean    0.5003317
pred_var    0.3412726
#explore partial plots
gbm.plot(base_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
base_int <- gbm.interactions(base_Yspat_Ytag)
base_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   651.04
2           3   chl_mean          1        tag   595.00
3          12 bathy_mean          1        tag   544.74
4           5   sal_mean          1        tag   499.41
5          14 dist_coast          1        tag   381.15
6           9 vostr_mean          1        tag   319.69
7           8    vo_mean          1        tag   285.70
8           4  temp_mean          1        tag   249.85
9          10   ssh_mean          1        tag   193.53
10         13   bathy_sd          1        tag   163.65
11          6    uo_mean          1        tag   123.96
#predictive performance using test dataset 
preds <- predict.gbm(base_Yspat_Ytag, dat_test_base,
                     n.trees = base_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_base$PA, preds) #get % deviance
[1] 0.1536541
dat_pred_base <- cbind(dat_test_base$PA, preds)
pres_base <- dat_pred_base[dat_pred_base[,1] == 1, 2]
abs_base <- dat_pred_base[dat_pred_base[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_base, a = abs_base)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7709545
max(e@TPR + e@TNR -1) #TSS
[1] 0.965249
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(base_Yspat_Ytag)
[1] 93.48138
#eval 75/25
eval_7525_modified(base_Yspat_Ytag, 
                 testInput = dat_test_base, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5300 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1339725 0.9641455 0.9954561 0.9899414          0.889162 0.9348138

DO models

I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include DO and the other environmental predictor variables as longer time scales (seasonal/annual).

do_0m_Nspat_Ytag <- readRDS(brt_outputs[14])

# model performance 
ggBRT::ggPerformance(do_0m_Nspat_Ytag)
                      Model 1
Total.Deviance     1.38629302
Residual.Deviance  0.08569242
Correlation        0.98673362
AUC                1.00000000
Per.Expl          93.81859271
cvDeviance         0.30444849
cvCorrelation      0.91329019
cvAUC              0.98270000
cvPer.Expl        78.03866260
#relative influence of predictors
ggBRT::ggInfluence(do_0m_Nspat_Ytag) 

              rel.inf
bathy_mean 31.3188924
o2_mean_0m 25.6922130
tag        20.8632526
temp_mean   4.9825010
ssh_mean    3.9449855
chl_mean    3.6738368
vostr_mean  2.1762902
uostr_mean  1.9424041
sal_mean    1.8962177
bathy_sd    0.9210810
mld_mean    0.8088337
uo_mean     0.7033574
vo_mean     0.6742398
pred_var    0.4018948
#explore partial plots
gbm.plot(do_0m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_Nspat_Ytag)
do_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           5   sal_mean          1        tag  1317.79
2          12 bathy_mean          1        tag   898.27
3           2 o2_mean_0m          1        tag   753.94
4           4  temp_mean          2 o2_mean_0m   731.95
5          13   bathy_sd          1        tag   461.04
6           4  temp_mean          1        tag   452.00
7           3   chl_mean          1        tag   350.97
8          10   ssh_mean          1        tag   322.16
9           9 vostr_mean          1        tag   308.68
10          8    vo_mean          1        tag   280.17
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_Nspat_Ytag, dat_test_do,
                     n.trees = do_0m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.1311332
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7691736
max(e@TPR + e@TNR -1) #TSS
[1] 0.9755573
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_Nspat_Ytag)
[1] 93.81859
#eval 75/25
eval_7525_modified(do_0m_Nspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5850 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1200159 0.9715853 0.9970746  0.998006         0.9054067 0.9381859
do_0m_Yspat_Ytag <- readRDS(brt_outputs[15])

# model performance 
ggBRT::ggPerformance(do_0m_Yspat_Ytag)
                      Model 1
Total.Deviance     1.38629302
Residual.Deviance  0.07111301
Correlation        0.98956599
AUC                1.00000000
Per.Expl          94.87027583
cvDeviance         0.26513270
cvCorrelation      0.92577703
cvAUC              0.98583000
cvPer.Expl        80.87470024
#relative influence of predictors
ggBRT::ggInfluence(do_0m_Yspat_Ytag) 

              rel.inf
dist_coast 52.2693314
tag        18.2789790
o2_mean_0m  9.9461704
lat         7.1842134
bathy_mean  3.0959517
chl_mean    1.9510753
sal_mean    1.4925425
temp_mean   1.2416253
vostr_mean  0.9383412
ssh_mean    0.8462597
vo_mean     0.5322002
bathy_sd    0.5206533
uo_mean     0.5128695
mld_mean    0.4554318
uostr_mean  0.4344147
pred_var    0.2999407
#explore partial plots
gbm.plot(do_0m_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_Yspat_Ytag)
do_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           3 o2_mean_0m          1        tag   681.29
2           2        lat          1        tag   614.15
3          15 dist_coast          1        tag   509.20
4          13 bathy_mean          1        tag   470.74
5           5  temp_mean          3 o2_mean_0m   461.63
6           4   chl_mean          1        tag   409.48
7           6   sal_mean          1        tag   393.97
8          14   bathy_sd          1        tag   350.26
9          10 vostr_mean          1        tag   252.37
10          9    vo_mean          1        tag   250.86
11         15 dist_coast          3 o2_mean_0m   214.24
12         11   ssh_mean          1        tag   202.54
13          5  temp_mean          1        tag   189.74
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_Yspat_Ytag, dat_test_do,
                     n.trees = do_0m_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.1138218
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7808104
max(e@TPR + e@TNR -1) #TSS
[1] 0.977085
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_Yspat_Ytag)
[1] 94.87028
#eval 75/25
eval_7525_modified(do_0m_Yspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5500 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1124583 0.9749439 0.9977097 0.9977912         0.9178943 0.9487028
do_0m_60m_Nspat_Ytag <- readRDS(brt_outputs[13])

# model performance 
ggBRT::ggPerformance(do_0m_60m_Nspat_Ytag)
                      Model 1
Total.Deviance     1.38629302
Residual.Deviance  0.06921534
Correlation        0.99039620
AUC                1.00000000
Per.Expl          95.00716346
cvDeviance         0.28881667
cvCorrelation      0.91957663
cvAUC              0.98372000
cvPer.Expl        79.16626123
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_Nspat_Ytag) 

               rel.inf
bathy_mean  28.6020233
o2_mean_0m  25.7568149
tag         19.2681616
o2_mean_60m 10.5230432
ssh_mean     3.9366672
chl_mean     3.0096521
temp_mean    1.8833026
sal_mean     1.7203767
uostr_mean   1.2905618
vostr_mean   1.2578356
mld_mean     0.6504126
bathy_sd     0.6501386
vo_mean      0.6103547
uo_mean      0.5075413
pred_var     0.3331139
#explore partial plots
gbm.plot(do_0m_60m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_Nspat_Ytag)
do_int$rank.list
   var1.index  var1.names var2.index var2.names int.size
1          12  bathy_mean          1        tag   827.96
2           5    sal_mean          1        tag   765.02
3           2  o2_mean_0m          1        tag   726.73
4          14 o2_mean_60m          1        tag   465.01
5           4   temp_mean          2 o2_mean_0m   448.02
6           3    chl_mean          1        tag   401.11
7           4   temp_mean          1        tag   368.01
8           9  vostr_mean          1        tag   330.29
9          13    bathy_sd          1        tag   285.54
10         10    ssh_mean          1        tag   258.37
11          8     vo_mean          1        tag   217.71
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_Nspat_Ytag, dat_test_do,
                     n.trees = do_0m_60m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.1180656
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7723101
max(e@TPR + e@TNR -1) #TSS
[1] 0.9775898
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_Nspat_Ytag)
[1] 95.00716
#eval 75/25
eval_7525_modified(do_0m_60m_Nspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6150 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1131281 0.9746304 0.9972402 0.9972539         0.9148331 0.9500716
do_0m_250m_Nspat_Ytag <- readRDS(brt_outputs[10])

# model performance 
ggBRT::ggPerformance(do_0m_250m_Nspat_Ytag)
                     Model 1
Total.Deviance     1.3862930
Residual.Deviance  0.0772215
Correlation        0.9880214
AUC                1.0000000
Per.Expl          94.4296408
cvDeviance         0.2923215
cvCorrelation      0.9180102
cvAUC              0.9832900
cvPer.Expl        78.9134402
#relative influence of predictors
ggBRT::ggInfluence(do_0m_250m_Nspat_Ytag) 

                rel.inf
o2_mean_0m   26.7124835
bathy_mean   24.5325773
tag          19.6354077
o2_mean_250m 16.3256290
ssh_mean      2.3877300
chl_mean      2.0737228
temp_mean     1.9080636
sal_mean      1.5922732
vostr_mean    1.0597981
bathy_sd      0.9220362
uostr_mean    0.8579829
vo_mean       0.6550567
uo_mean       0.5147781
mld_mean      0.5111500
pred_var      0.3113108
#explore partial plots
gbm.plot(do_0m_250m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_250m_Nspat_Ytag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1           5     sal_mean          1        tag  1083.83
2           2   o2_mean_0m          1        tag   810.86
3           4    temp_mean          2 o2_mean_0m   611.00
4          14 o2_mean_250m          1        tag   533.46
5          12   bathy_mean          1        tag   465.44
6           4    temp_mean          1        tag   407.57
7           3     chl_mean          1        tag   361.70
8          13     bathy_sd          1        tag   323.33
9          10     ssh_mean          1        tag   307.93
10          9   vostr_mean          1        tag   257.21
11         14 o2_mean_250m          2 o2_mean_0m   241.29
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_250m_Nspat_Ytag, dat_test_do,
                     n.trees = do_0m_250m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.1232636
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7771163
max(e@TPR + e@TNR -1) #TSS
[1] 0.9760588
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_250m_Nspat_Ytag)
[1] 94.42964
#eval 75/25
eval_7525_modified(do_0m_250m_Nspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5650 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1166426 0.9730488 0.9972335 0.9974503         0.9110834 0.9442964
do_0m_60m_250m_Nspat_Ytag <- readRDS(brt_outputs[11])

# model performance 
ggBRT::ggPerformance(do_0m_60m_250m_Nspat_Ytag)
                      Model 1
Total.Deviance     1.38629302
Residual.Deviance  0.06405767
Correlation        0.99133355
AUC                1.00000000
Per.Expl          95.37921112
cvDeviance         0.27651361
cvCorrelation      0.92323053
cvAUC              0.98486000
cvPer.Expl        80.05373977
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_250m_Nspat_Ytag) 

                rel.inf
o2_mean_0m   26.3744059
bathy_mean   24.2567146
tag          18.4164082
o2_mean_250m 12.9309717
o2_mean_60m   5.8644578
ssh_mean      3.4303989
chl_mean      1.8947511
temp_mean     1.3576031
sal_mean      1.2825109
uostr_mean    0.8488525
vostr_mean    0.8461226
bathy_sd      0.6934417
vo_mean       0.5279284
mld_mean      0.5255140
uo_mean       0.4617161
pred_var      0.2882026
#explore partial plots
gbm.plot(do_0m_60m_250m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_250m_Nspat_Ytag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1           2   o2_mean_0m          1        tag   756.09
2           5     sal_mean          1        tag   744.60
3           4    temp_mean          2 o2_mean_0m   681.68
4          12   bathy_mean          1        tag   457.99
5          15 o2_mean_250m          1        tag   423.21
6          14  o2_mean_60m          1        tag   351.76
7           3     chl_mean          1        tag   331.31
8           4    temp_mean          1        tag   322.01
9          13     bathy_sd          1        tag   241.45
10          9   vostr_mean          1        tag   237.92
11          8      vo_mean          1        tag   230.84
12         10     ssh_mean          1        tag   227.37
13          7   uostr_mean          1        tag   207.53
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_250m_Nspat_Ytag, dat_test_do,
                     n.trees = do_0m_60m_250m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.1131788
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7783925
max(e@TPR + e@TNR -1) #TSS
[1] 0.9786226
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_250m_Nspat_Ytag)
[1] 95.37921
#eval 75/25
eval_7525_modified(do_0m_60m_250m_Nspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6000 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1111881 0.9754482 0.9973543 0.9967165         0.9183581 0.9537921
do_0m_60m_250m_Yspat_Ytag <- readRDS(brt_outputs[12])

# model performance 
ggBRT::ggPerformance(do_0m_60m_250m_Yspat_Ytag)
                      Model 1
Total.Deviance     1.38629302
Residual.Deviance  0.06107201
Correlation        0.99179288
AUC                1.00000000
Per.Expl          95.59458127
cvDeviance         0.26289954
cvCorrelation      0.92771979
cvAUC              0.98575000
cvPer.Expl        81.03578841
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_250m_Yspat_Ytag) 

                rel.inf
dist_coast   50.7776327
tag          17.4708084
o2_mean_0m   10.2952610
o2_mean_250m  4.9246556
lat           4.4525242
o2_mean_60m   2.8631545
chl_mean      1.8981111
bathy_mean    1.7092848
sal_mean      1.0541439
temp_mean     0.9245410
vostr_mean    0.6430521
ssh_mean      0.6087970
vo_mean       0.4919662
uostr_mean    0.4554889
bathy_sd      0.4404908
mld_mean      0.4397390
uo_mean       0.3089437
pred_var      0.2414052
#explore partial plots
gbm.plot(do_0m_60m_250m_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_250m_Yspat_Ytag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1           3   o2_mean_0m          1        tag   639.99
2           6     sal_mean          1        tag   381.15
3          14     bathy_sd          1        tag   369.34
4          15   dist_coast          1        tag   365.12
5           2          lat          1        tag   346.60
6          17 o2_mean_250m          1        tag   306.59
7           4     chl_mean          1        tag   290.18
8          13   bathy_mean          1        tag   285.08
9          16  o2_mean_60m          5  temp_mean   253.71
10         16  o2_mean_60m          1        tag   244.79
11          5    temp_mean          3 o2_mean_0m   222.92
12         10   vostr_mean          1        tag   215.04
13          9      vo_mean          1        tag   167.52
14          8   uostr_mean          1        tag   116.74
15         15   dist_coast          3 o2_mean_0m   109.89
16         11     ssh_mean          1        tag   109.82
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_250m_Yspat_Ytag, dat_test_do,
                     n.trees = do_0m_60m_250m_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.1026398
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7833562
max(e@TPR + e@TNR -1) #TSS
[1] 0.978366
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_250m_Yspat_Ytag)
[1] 95.59458
#eval 75/25
eval_7525_modified(do_0m_60m_250m_Yspat_Ytag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5700 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1067755 0.9773981 0.9982434 0.9959618         0.9259605 0.9559458

AGI models

I ran a suite of models that include various combinations of data at depth, spatial predictors, and tag ID predictors. Moving forward, I would also like to include AGI and the other environmental predictor variables as longer time scales (seasonal/annual).

agi_0m_Nspat_Ytag <- readRDS(brt_outputs[5])

# model performance 
ggBRT::ggPerformance(agi_0m_Nspat_Ytag)
                      Model 1
Total.Deviance     1.38628919
Residual.Deviance  0.09058218
Correlation        0.98574976
AUC                1.00000000
Per.Expl          93.46585281
cvDeviance         0.31679477
cvCorrelation      0.91027275
cvAUC              0.98076000
cvPer.Expl        77.14800254
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_Nspat_Ytag) 

              rel.inf
bathy_mean 31.7564227
tag        22.4317739
temp_mean  19.1472195
uostr_mean  4.6441372
ssh_mean    4.6035826
AGI_0m      4.3344168
sal_mean    3.7523624
chl_mean    2.9185772
vostr_mean  2.4429168
bathy_sd    1.2895177
vo_mean     0.8922124
mld_mean    0.7121506
uo_mean     0.6928352
pred_var    0.3818752
#explore partial plots
gbm.plot(agi_0m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_Nspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  1670.77
2           4   sal_mean          1        tag  1358.50
3           3  temp_mean          1        tag   796.37
4          11 bathy_mean          1        tag   716.67
5          12   bathy_sd          1        tag   500.56
6           8 vostr_mean          1        tag   360.82
7           2   chl_mean          1        tag   350.27
8          11 bathy_mean          3  temp_mean   343.69
9           9   ssh_mean          1        tag   293.85
10          7    vo_mean          1        tag   283.73
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_Nspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.1426735
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7635415
max(e@TPR + e@TNR -1) #TSS
[1] 0.9696389
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_Nspat_Ytag)
[1] 93.46585
#eval 75/25
eval_7525_modified(agi_0m_Nspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5900 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1269187 0.9680701 0.9963482 0.9991957         0.8970815 0.9346585
agi_0m_Yspat_Ytag <- readRDS(brt_outputs[6])

# model performance 
ggBRT::ggPerformance(agi_0m_Yspat_Ytag)
                      Model 1
Total.Deviance     1.38628919
Residual.Deviance  0.07536563
Correlation        0.98861806
AUC                1.00000000
Per.Expl          94.56349886
cvDeviance         0.28457545
cvCorrelation      0.92193385
cvAUC              0.98308000
cvPer.Expl        79.47214380
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_Yspat_Ytag) 

              rel.inf
dist_coast 52.6885132
tag        19.0100569
lat         9.3755080
temp_mean   3.8940828
bathy_mean  3.1187408
AGI_0m      2.5281708
chl_mean    2.1679033
sal_mean    2.0794485
ssh_mean    1.1326443
vostr_mean  0.8369160
uostr_mean  0.6860411
bathy_sd    0.6046589
vo_mean     0.5672036
uo_mean     0.5434052
mld_mean    0.4635287
pred_var    0.3031779
#explore partial plots
gbm.plot(agi_0m_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_Yspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   618.71
2           5   sal_mean          1        tag   458.94
3           3   chl_mean          1        tag   457.19
4          12 bathy_mean          1        tag   444.88
5          15 dist_coast          1        tag   353.77
6           4  temp_mean          1        tag   314.80
7          13   bathy_sd          1        tag   307.47
8           8    vo_mean          1        tag   282.40
9           9 vostr_mean          1        tag   277.46
10         14     AGI_0m          4  temp_mean   224.72
11         14     AGI_0m          1        tag   223.58
12         10   ssh_mean          1        tag   155.83
13          7 uostr_mean          1        tag   125.51
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_Yspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.1235775
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7758431
max(e@TPR + e@TNR -1) #TSS
[1] 0.9758245
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_Yspat_Ytag)
[1] 94.5635
#eval 75/25
eval_7525_modified(agi_0m_Yspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5550 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE       Cor  C-index PredRatio DevianceExplained PseudoR2
1 0.1172909 0.9726691 0.996907 0.9962742         0.9108565 0.945635
agi_0m_60m_Nspat_Ytag <- readRDS(brt_outputs[4])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_Nspat_Ytag)
                      Model 1
Total.Deviance     1.38628919
Residual.Deviance  0.08163925
Correlation        0.98784986
AUC                1.00000000
Per.Expl          94.11095093
cvDeviance         0.30657340
cvCorrelation      0.91435083
cvAUC              0.98164000
cvPer.Expl        77.88532168
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_Nspat_Ytag) 

              rel.inf
bathy_mean 31.0165218
tag        22.7089581
temp_mean  19.3123505
uostr_mean  4.0921717
AGI_0m      4.0228983
AGI_60m     3.9028901
sal_mean    3.7019688
ssh_mean    3.0607407
vostr_mean  2.4285168
chl_mean    2.2359268
bathy_sd    1.0013958
vo_mean     0.8073497
uo_mean     0.7416074
mld_mean    0.6546704
pred_var    0.3120329
#explore partial plots
gbm.plot(agi_0m_60m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_Nspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  1546.61
2           4   sal_mean          1        tag  1160.06
3           3  temp_mean          1        tag   718.56
4          11 bathy_mean          1        tag   678.18
5          14    AGI_60m          1        tag   562.58
6          11 bathy_mean          3  temp_mean   375.86
7           2   chl_mean          1        tag   371.78
8           8 vostr_mean          1        tag   332.14
9          12   bathy_sd          1        tag   314.26
10          7    vo_mean          1        tag   270.59
11          9   ssh_mean          1        tag   248.80
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_Nspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_60m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.1313386
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7637301
max(e@TPR + e@TNR -1) #TSS
[1] 0.9707055
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_Nspat_Ytag)
[1] 94.11095
#eval 75/25
eval_7525_modified(agi_0m_60m_Nspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5950 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1207813 0.9711081 0.9967905   0.99765          0.905258 0.9411095
agi_0m_250m_Nspat_Ytag <- readRDS(brt_outputs[1])

# model performance 
ggBRT::ggPerformance(agi_0m_250m_Nspat_Ytag)
                      Model 1
Total.Deviance     1.38628919
Residual.Deviance  0.08329738
Correlation        0.98684251
AUC                1.00000000
Per.Expl          93.99134165
cvDeviance         0.29978280
cvCorrelation      0.91514338
cvAUC              0.98237000
cvPer.Expl        78.37516119
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_250m_Nspat_Ytag) 

              rel.inf
bathy_mean 25.7556370
tag        21.5992835
temp_mean  17.8343206
AGI_250m   12.5454181
uostr_mean  4.7132797
ssh_mean    4.0214353
AGI_0m      3.6118354
sal_mean    3.1006797
chl_mean    1.9387158
vostr_mean  1.4830631
bathy_sd    1.1646793
vo_mean     0.7268532
mld_mean    0.5851221
uo_mean     0.5700321
pred_var    0.3496450
#explore partial plots
gbm.plot(agi_0m_250m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_250m_Nspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           4   sal_mean          1        tag  1107.60
2          13     AGI_0m          3  temp_mean   959.81
3           3  temp_mean          1        tag   653.96
4          14   AGI_250m          1        tag   483.70
5           2   chl_mean          1        tag   436.22
6          12   bathy_sd          1        tag   411.43
7          11 bathy_mean          1        tag   378.45
8           7    vo_mean          1        tag   342.67
9           8 vostr_mean          1        tag   323.33
10          9   ssh_mean          1        tag   260.62
11          6 uostr_mean          1        tag   213.62
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_250m_Nspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_250m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.1374469
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7693346
max(e@TPR + e@TNR -1) #TSS
[1] 0.9704405
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_250m_Nspat_Ytag)
[1] 93.99134
#eval 75/25
eval_7525_modified(agi_0m_250m_Nspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5650 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.123622 0.9696011 0.9959497 0.9972633         0.9008517 0.9399134
agi_0m_60m_250m_Nspat_Ytag <- readRDS(brt_outputs[2])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_250m_Nspat_Ytag)
                      Model 1
Total.Deviance     1.38628919
Residual.Deviance  0.07572957
Correlation        0.98877144
AUC                1.00000000
Per.Expl          94.53724618
cvDeviance         0.29301289
cvCorrelation      0.91783184
cvAUC              0.98281000
cvPer.Expl        78.86350878
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_250m_Nspat_Ytag) 

              rel.inf
bathy_mean 26.8363780
tag        21.0957906
temp_mean  18.0262218
AGI_250m   12.1947644
uostr_mean  4.4840074
AGI_0m      3.8063516
ssh_mean    3.0616953
sal_mean    2.7962168
chl_mean    1.7130823
AGI_60m     1.4809786
vostr_mean  1.2200984
bathy_sd    1.0170479
vo_mean     0.8630534
uo_mean     0.5773383
mld_mean    0.5453203
pred_var    0.2816549
#explore partial plots
gbm.plot(agi_0m_60m_250m_Nspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_250m_Nspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  1152.68
2           4   sal_mean          1        tag  1005.00
3           3  temp_mean          1        tag   635.71
4          15   AGI_250m          1        tag   397.25
5           2   chl_mean          1        tag   396.66
6          14    AGI_60m          1        tag   391.13
7           7    vo_mean          1        tag   371.09
8          11 bathy_mean          1        tag   357.41
9          12   bathy_sd          1        tag   320.70
10          8 vostr_mean          1        tag   299.57
11          9   ssh_mean          1        tag   232.02
12         15   AGI_250m          3  temp_mean   161.28
13          6 uostr_mean          1        tag   142.65
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_250m_Nspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_60m_250m_Nspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.1286697
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.768278
max(e@TPR + e@TNR -1) #TSS
[1] 0.9727492
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_250m_Nspat_Ytag)
[1] 94.53725
#eval 75/25
eval_7525_modified(agi_0m_60m_250m_Nspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5700 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
     RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.11824 0.9722116 0.9961794 0.9980251         0.9071832 0.9453725
agi_0m_60m_250m_Yspat_Ytag <- readRDS(brt_outputs[3])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_250m_Yspat_Ytag)
                      Model 1
Total.Deviance     1.38628919
Residual.Deviance  0.07231845
Correlation        0.98904888
AUC                1.00000000
Per.Expl          94.78330693
cvDeviance         0.27235546
cvCorrelation      0.92438633
cvAUC              0.98474000
cvPer.Expl        80.35363316
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_250m_Yspat_Ytag) 

              rel.inf
dist_coast 52.2028237
tag        19.8286802
lat         8.5259754
temp_mean   3.6800178
AGI_250m    3.2802055
AGI_0m      2.2035478
bathy_mean  1.8557018
chl_mean    1.7402778
sal_mean    1.4654785
AGI_60m     1.1294626
ssh_mean    0.7267973
vostr_mean  0.7162280
uostr_mean  0.5488083
bathy_sd    0.5294415
vo_mean     0.5050275
mld_mean    0.4284863
uo_mean     0.3811785
pred_var    0.2518614
#explore partial plots
gbm.plot(agi_0m_60m_250m_Yspat_Ytag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_250m_Yspat_Ytag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1           2        lat          1        tag   455.96
2          16    AGI_60m          1        tag   450.35
3           3   chl_mean          1        tag   425.15
4          14     AGI_0m          4  temp_mean   409.72
5          13   bathy_sd          1        tag   384.02
6           5   sal_mean          1        tag   331.25
7          12 bathy_mean          1        tag   330.47
8          17   AGI_250m          1        tag   285.00
9           9 vostr_mean          1        tag   237.90
10          4  temp_mean          1        tag   237.18
11          8    vo_mean          1        tag   234.93
12         14     AGI_0m          1        tag   169.22
13         15 dist_coast          1        tag   157.56
14          7 uostr_mean          1        tag   127.32
15         16    AGI_60m          3   chl_mean    99.12
16         14     AGI_0m          2        lat    88.90
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_250m_Yspat_Ytag, dat_test_agi,
                     n.trees = agi_0m_60m_250m_Yspat_Ytag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.1221806
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7732709
max(e@TPR + e@TNR -1) #TSS
[1] 0.9742956
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_250m_Yspat_Ytag)
[1] 94.78331
#eval 75/25
eval_7525_modified(agi_0m_60m_250m_Yspat_Ytag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5200 iterations were performed.
There were 18 predictors of which 18 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.115729 0.9733434 0.9965503 0.9956153         0.9118642 0.9478331

Summary table of results

output_sum <- read.csv(here("data/brt/mod_outputs/brt_bckg_output_summary.csv"))
kableExtra::kable(output_sum)
model percent_explained calc_deviance TPR_mean TSS AUC RMSE SpearmanCor C.index PredRatio PseudoR2
base_0m_Nspat_Ntag 79.360 0.375 0.740 0.872 0.980 0.228 0.891 0.980 0.998 0.794
base_0m_Nspat_Ytag 91.990 0.182 0.759 0.957 0.994 0.147 0.957 0.994 0.994 0.920
base_0m_Yspat_Ytag 93.500 0.154 0.771 0.965 0.995 0.134 0.964 0.995 0.990 0.935
do_0m_Nspat_Ytag 93.820 0.131 0.769 0.976 0.997 0.120 0.972 0.997 0.998 0.938
do_0m_Yspat_Ytag 94.870 0.114 0.781 0.977 0.998 0.112 0.975 0.998 0.998 0.949
do_0m_60m_Nspat_Ytag 95.007 0.118 0.772 0.978 0.997 0.113 0.975 0.997 0.997 0.950
do_0m_250m_Nspat_Ytag 94.430 0.123 0.777 0.976 0.997 0.117 0.973 0.997 0.997 0.944
do_0m_60m_250m_Nspat_Ytag 95.380 0.113 0.778 0.979 0.997 0.111 0.975 0.997 0.997 0.954
do_0m_60m_250m_Yspat_Ytag 95.590 0.103 0.783 0.978 0.998 0.107 0.977 0.998 0.996 0.956
agi_0m_Nspat_Ytag 93.470 0.143 0.764 0.970 0.996 0.127 0.968 0.996 0.999 0.934
agi_0m_Yspat_Ytag 94.560 0.124 0.776 0.976 0.997 0.117 0.973 0.997 0.996 0.946
agi_0m_60m_Nspat_Ytag 94.111 0.131 0.764 0.971 0.997 0.121 0.971 0.997 0.998 0.941
agi_0m_250m_Nspat_Ytag 93.990 0.137 0.769 0.970 0.996 0.124 0.970 0.996 0.997 0.940
agi_0m_60m_250m_Nspat_Ytag 94.540 0.129 0.768 0.973 0.996 0.118 0.972 0.996 0.998 0.945
agi_0m_60m_250m_Yspat_Ytag 94.780 0.122 0.773 0.974 0.997 0.116 0.973 0.997 0.996 0.948
ggplot(output_sum, aes(AUC, TSS, color = percent_explained, label = model)) + 
  geom_point(size = 3) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                  box.padding   = 0.35, 
                  point.padding = 0.5,
                  segment.color = 'grey50')

DO models w/o tag ID

Here, I have run the same models as above, but without tag ID as a predictor variable. For this chunk of models, I am interested in identifying the role that dissolved oxygen may play in habitat suitability predictions, and how its relative importance compares to other covariates that are typically included in SDMs. Additionally, as BRTs are nonparametric, it is not critical or necessary for tag ID to be included.

do_0m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[12])

# model performance 
ggBRT::ggPerformance(do_0m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862934
Residual.Deviance  0.2188393
Correlation        0.9490943
AUC                0.9966000
Per.Expl          84.2140724
cvDeviance         0.5170811
cvCorrelation      0.8339445
cvAUC              0.9588700
cvPer.Expl        62.7004570
#relative influence of predictors
ggBRT::ggInfluence(do_0m_Nspat_Ntag) 

             rel.inf
bathy_mean 36.618282
o2_mean_0m 29.909294
temp_mean   7.873891
chl_mean    5.305086
ssh_mean    4.223122
sal_mean    3.004185
vostr_mean  2.838490
mld_mean    2.084183
bathy_sd    1.974745
uostr_mean  1.913768
uo_mean     1.739428
vo_mean     1.407480
pred_var    1.108047
#explore partial plots
gbm.plot(do_0m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_Nspat_Ntag)
do_int$rank.list
  var1.index var1.names var2.index var2.names int.size
1          3  temp_mean          1 o2_mean_0m   689.53
2         11 bathy_mean          4   sal_mean   592.77
3         11 bathy_mean          9   ssh_mean   576.98
4         11 bathy_mean          5    uo_mean   442.14
5          9   ssh_mean          3  temp_mean   385.11
6         11 bathy_mean          3  temp_mean   378.70
7         11 bathy_mean          1 o2_mean_0m   266.27
8         12   bathy_sd          1 o2_mean_0m   255.59
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_Nspat_Ntag, dat_test_do,
                     n.trees = do_0m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.3163543
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7429917
max(e@TPR + e@TNR -1) #TSS
[1] 0.8988065
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_Nspat_Ntag)
[1] 84.21407
#eval 75/25
eval_7525_modified(do_0m_Nspat_Ntag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4650 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2078864 0.9108071 0.9855748 0.9981125         0.7717969 0.8421407
do_0m_Yspat_Ntag <- readRDS(brt_outputs_Ntag[13])

# model performance 
ggBRT::ggPerformance(do_0m_Yspat_Ntag)
                     Model 1
Total.Deviance     1.3862934
Residual.Deviance  0.1949867
Correlation        0.9556768
AUC                0.9975000
Per.Expl          85.9346757
cvDeviance         0.4783362
cvCorrelation      0.8485144
cvAUC              0.9643500
cvPer.Expl        65.4953096
#relative influence of predictors
ggBRT::ggInfluence(do_0m_Yspat_Ntag) 

              rel.inf
dist_coast 53.5696625
o2_mean_0m 12.6309567
lat         8.1195782
bathy_mean  5.6664282
chl_mean    3.8500282
temp_mean   2.9692000
sal_mean    2.4553503
ssh_mean    2.1991953
vostr_mean  1.6248422
mld_mean    1.5073425
uo_mean     1.3556893
bathy_sd    1.2144774
uostr_mean  1.0434857
vo_mean     0.9699671
pred_var    0.8237967
#explore partial plots
gbm.plot(do_0m_Yspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_Yspat_Ntag)
do_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          12 bathy_mean          4  temp_mean   969.31
2           4  temp_mean          2 o2_mean_0m   582.13
3          12 bathy_mean          1        lat   295.21
4          12 bathy_mean         10   ssh_mean   220.95
5           3   chl_mean          2 o2_mean_0m   208.59
6           4  temp_mean          1        lat   197.51
7          12 bathy_mean          6    uo_mean   180.50
8          13   bathy_sd          2 o2_mean_0m   168.48
9           3   chl_mean          1        lat   143.75
10         14 dist_coast          5   sal_mean   142.09
11         12 bathy_mean          5   sal_mean   139.44
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_Yspat_Ntag, dat_test_do,
                     n.trees = do_0m_Yspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.2860822
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7444084
max(e@TPR + e@TNR -1) #TSS
[1] 0.9068106
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_Yspat_Ntag)
[1] 85.93468
#eval 75/25
eval_7525_modified(do_0m_Yspat_Ntag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4500 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1961425 0.9209367 0.9877846 0.9968631         0.7936337 0.8593468
do_0m_60m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[11])

# model performance 
ggBRT::ggPerformance(do_0m_60m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862934
Residual.Deviance  0.2152171
Correlation        0.9494120
AUC                0.9963000
Per.Expl          84.4753535
cvDeviance         0.4980739
cvCorrelation      0.8417192
cvAUC              0.9613600
cvPer.Expl        64.0715342
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_Nspat_Ntag) 

               rel.inf
bathy_mean  33.3337448
o2_mean_0m  29.2907706
o2_mean_60m 10.7154473
chl_mean     4.7874063
ssh_mean     4.4654220
temp_mean    4.2047433
sal_mean     2.8985582
vostr_mean   1.9514731
mld_mean     1.8318151
uo_mean      1.6194828
bathy_sd     1.5388993
uostr_mean   1.4127257
vo_mean      0.9997546
pred_var     0.9497570
#explore partial plots
gbm.plot(do_0m_60m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_Nspat_Ntag)
do_int$rank.list
   var1.index  var1.names var2.index var2.names int.size
1           3   temp_mean          1 o2_mean_0m   587.36
2          11  bathy_mean          9   ssh_mean   529.13
3          13 o2_mean_60m          3  temp_mean   467.19
4          11  bathy_mean          5    uo_mean   395.13
5          11  bathy_mean          3  temp_mean   317.18
6          11  bathy_mean          4   sal_mean   312.00
7           6  uostr_mean          1 o2_mean_0m   296.24
8          12    bathy_sd          1 o2_mean_0m   223.69
9          13 o2_mean_60m          2   chl_mean   166.11
10         11  bathy_mean          1 o2_mean_0m   151.42
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_Nspat_Ntag, dat_test_do,
                     n.trees = do_0m_60m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.3054413
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7433705
max(e@TPR + e@TNR -1) #TSS
[1] 0.9012114
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_Nspat_Ntag)
[1] 84.47535
#eval 75/25
eval_7525_modified(do_0m_60m_Nspat_Ntag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4350 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2039139 0.9142042 0.9862436 0.9962925         0.7796691 0.8447535
do_0m_250m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[8])

# model performance 
ggBRT::ggPerformance(do_0m_250m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862934
Residual.Deviance  0.2335030
Correlation        0.9429315
AUC                0.9954000
Per.Expl          83.1563108
cvDeviance         0.5005420
cvCorrelation      0.8398630
cvAUC              0.9608000
cvPer.Expl        63.8935019
#relative influence of predictors
ggBRT::ggInfluence(do_0m_250m_Nspat_Ntag) 

                rel.inf
bathy_mean   30.0267521
o2_mean_0m   29.7310875
o2_mean_250m 17.3780027
temp_mean     3.9785508
chl_mean      3.7684071
sal_mean      2.9143743
ssh_mean      2.5775664
bathy_sd      1.7071494
vostr_mean    1.6822884
uo_mean       1.4379251
uostr_mean    1.4253686
mld_mean      1.3330125
vo_mean       1.1026202
pred_var      0.9368949
#explore partial plots
gbm.plot(do_0m_250m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_250m_Nspat_Ntag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1          11   bathy_mean          4   sal_mean   334.54
2           3    temp_mean          1 o2_mean_0m   330.27
3          11   bathy_mean          9   ssh_mean   277.86
4          13 o2_mean_250m          1 o2_mean_0m   276.54
5           6   uostr_mean          1 o2_mean_0m   255.46
6          11   bathy_mean          3  temp_mean   207.76
7           2     chl_mean          1 o2_mean_0m   190.07
8          12     bathy_sd          1 o2_mean_0m   186.35
9          13 o2_mean_250m          4   sal_mean   156.17
10         13 o2_mean_250m          3  temp_mean   151.82
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_250m_Nspat_Ntag, dat_test_do,
                     n.trees = do_0m_250m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.3233148
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7424683
max(e@TPR + e@TNR -1) #TSS
[1] 0.8898093
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_250m_Nspat_Ntag)
[1] 83.15631
#eval 75/25
eval_7525_modified(do_0m_250m_Nspat_Ntag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
3850 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2116599 0.9071004 0.9846599 0.9967308         0.7667759 0.8315631
do_0m_60m_250m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[9])

# model performance 
ggBRT::ggPerformance(do_0m_60m_250m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862934
Residual.Deviance  0.1967857
Correlation        0.9555813
AUC                0.9975000
Per.Expl          85.8049005
cvDeviance         0.4879500
cvCorrelation      0.8446379
cvAUC              0.9630100
cvPer.Expl        64.8018226
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_250m_Nspat_Ntag) 

               rel.inf
bathy_mean   29.784726
o2_mean_0m   29.575237
o2_mean_250m 12.535092
o2_mean_60m   6.692749
chl_mean      3.607960
temp_mean     3.236009
sal_mean      2.956732
ssh_mean      2.508817
vostr_mean    1.606424
bathy_sd      1.536286
uo_mean       1.440651
uostr_mean    1.322227
mld_mean      1.317766
vo_mean       1.024736
pred_var      0.854590
#explore partial plots
gbm.plot(do_0m_60m_250m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_250m_Nspat_Ntag)
do_int$rank.list
   var1.index   var1.names var2.index  var2.names int.size
1           3    temp_mean          1  o2_mean_0m   575.57
2           2     chl_mean          1  o2_mean_0m   367.59
3          11   bathy_mean          4    sal_mean   247.44
4          11   bathy_mean          5     uo_mean   244.71
5          13  o2_mean_60m          3   temp_mean   233.27
6          11   bathy_mean          9    ssh_mean   222.57
7          11   bathy_mean          3   temp_mean   204.95
8          14 o2_mean_250m         13 o2_mean_60m   192.80
9          14 o2_mean_250m          1  o2_mean_0m   184.47
10         13  o2_mean_60m         11  bathy_mean   168.43
11         12     bathy_sd          1  o2_mean_0m   167.06
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_250m_Nspat_Ntag, dat_test_do,
                     n.trees = do_0m_60m_250m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.2915978
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7440062
max(e@TPR + e@TNR -1) #TSS
[1] 0.9065458
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_250m_Nspat_Ntag)
[1] 85.8049
#eval 75/25
eval_7525_modified(do_0m_60m_250m_Nspat_Ntag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4550 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained PseudoR2
1 0.1981623 0.9191668 0.9872286  0.999063         0.7896551 0.858049
do_0m_60m_250m_Yspat_Ntag <- readRDS(brt_outputs_Ntag[10])

# model performance 
ggBRT::ggPerformance(do_0m_60m_250m_Yspat_Ntag)
                     Model 1
Total.Deviance     1.3862934
Residual.Deviance  0.1711730
Correlation        0.9637236
AUC                0.9984000
Per.Expl          87.6524677
cvDeviance         0.4688544
cvCorrelation      0.8531779
cvAUC              0.9656700
cvPer.Expl        66.1792818
#relative influence of predictors
ggBRT::ggInfluence(do_0m_60m_250m_Yspat_Ntag) 

                rel.inf
dist_coast   51.2057041
o2_mean_0m   11.5940030
o2_mean_250m  7.6458020
lat           4.9904405
bathy_mean    3.8441712
o2_mean_60m   3.4353426
chl_mean      3.1827866
temp_mean     2.7383315
sal_mean      2.2099326
ssh_mean      1.9054953
mld_mean      1.3048380
vostr_mean    1.1565039
uo_mean       1.1379053
bathy_sd      1.1340787
uostr_mean    0.9609815
vo_mean       0.7913691
pred_var      0.7623142
#explore partial plots
gbm.plot(do_0m_60m_250m_Yspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
do_int <- gbm.interactions(do_0m_60m_250m_Yspat_Ntag)
do_int$rank.list
   var1.index   var1.names var2.index var2.names int.size
1          12   bathy_mean          4  temp_mean   460.19
2          12   bathy_mean          6    uo_mean   369.18
3           4    temp_mean          2 o2_mean_0m   294.93
4          15  o2_mean_60m          4  temp_mean   221.21
5           3     chl_mean          2 o2_mean_0m   213.95
6          16 o2_mean_250m          1        lat   194.30
7          12   bathy_mean          1        lat   176.52
8          12   bathy_mean          5   sal_mean   174.94
9          15  o2_mean_60m          5   sal_mean   157.19
10         12   bathy_mean          3   chl_mean   136.43
11          3     chl_mean          1        lat   135.67
12         15  o2_mean_60m         12 bathy_mean   121.03
13          7   uostr_mean          2 o2_mean_0m   114.49
14         12   bathy_mean         10   ssh_mean   109.97
#predictive performance using test dataset 
preds <- predict.gbm(do_0m_60m_250m_Yspat_Ntag, dat_test_do,
                     n.trees = do_0m_60m_250m_Yspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_do$PA, preds) #get % deviance
[1] 0.2611485
dat_pred_do <- cbind(dat_test_do$PA, preds)
pres_do <- dat_pred_do[dat_pred_do[,1] == 1, 2]
abs_do <- dat_pred_do[dat_pred_do[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_do, a = abs_do)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7455318
max(e@TPR + e@TNR -1) #TSS
[1] 0.9219847
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(do_0m_60m_250m_Yspat_Ntag)
[1] 87.65247
#eval 75/25
eval_7525_modified(do_0m_60m_250m_Yspat_Ntag, 
                 testInput = dat_test_do, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4850 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1850589 0.9301022 0.9896335 0.9972148         0.8116198 0.8765247

AGI models w/o tag ID

Here, I have run the same models as above, but without tag ID as a predictor variable. For this chunk of models, I am interested in identifying the role that AGI may play in habitat suitability predictions, and how its relative importance compares to other covariates that are typically included in SDMs. Additionally, as BRTs are nonparametric, it is not critical or necessary for tag ID to be included.

agi_0m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[5])

# model performance 
ggBRT::ggPerformance(agi_0m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862490
Residual.Deviance  0.2494636
Correlation        0.9389402
AUC                0.9946000
Per.Expl          82.0044167
cvDeviance         0.5227643
cvCorrelation      0.8310463
cvAUC              0.9575400
cvPer.Expl        62.2892911
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_Nspat_Ntag) 

             rel.inf
bathy_mean 36.491740
temp_mean  21.375886
AGI_0m     10.198606
sal_mean    5.377754
ssh_mean    5.311421
uostr_mean  5.009516
chl_mean    4.871749
vostr_mean  3.562961
bathy_sd    2.025273
mld_mean    1.700623
uo_mean     1.632024
vo_mean     1.472134
pred_var    0.970312
#explore partial plots
gbm.plot(agi_0m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_Nspat_Ntag)
agi_int$rank.list
  var1.index var1.names var2.index var2.names int.size
1         12     AGI_0m          2  temp_mean  6302.78
2         10 bathy_mean          2  temp_mean   752.63
3         10 bathy_mean          8   ssh_mean   463.53
4         12     AGI_0m          8   ssh_mean   447.89
5          5 uostr_mean          2  temp_mean   358.75
6         10 bathy_mean          3   sal_mean   358.05
7         10 bathy_mean          4    uo_mean   261.61
8          8   ssh_mean          4    uo_mean   253.56
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_Nspat_Ntag, dat_test_agi,
                     n.trees = agi_0m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.3265073
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7424835
max(e@TPR + e@TNR -1) #TSS
[1] 0.8904318
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_Nspat_Ntag)
[1] 82.00442
#eval 75/25
eval_7525_modified(agi_0m_Nspat_Ntag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4100 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2116179 0.9077497 0.9853149 0.9994755         0.7644717 0.8200442
agi_0m_Yspat_Ntag <- readRDS(brt_outputs_Ntag[6])

# model performance 
ggBRT::ggPerformance(agi_0m_Yspat_Ntag)
                     Model 1
Total.Deviance     1.3862490
Residual.Deviance  0.2061652
Correlation        0.9520613
AUC                0.9968000
Per.Expl          85.1278375
cvDeviance         0.4844196
cvCorrelation      0.8457186
cvAUC              0.9631500
cvPer.Expl        65.0553696
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_Yspat_Ntag) 

              rel.inf
dist_coast 53.2176953
lat        10.6179286
AGI_0m      7.1940580
bathy_mean  6.9716928
temp_mean   5.2270863
chl_mean    3.4956185
sal_mean    2.7812823
ssh_mean    2.0438267
uo_mean     1.4980073
mld_mean    1.4534671
vostr_mean  1.4445966
uostr_mean  1.1261180
bathy_sd    1.0472647
vo_mean     1.0242153
pred_var    0.8571425
#explore partial plots
gbm.plot(agi_0m_Yspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_Yspat_Ntag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  2156.60
2          13     AGI_0m          1        lat   801.90
3           3  temp_mean          1        lat   588.20
4          11 bathy_mean          3  temp_mean   459.87
5          14 dist_coast         10   mld_mean   407.88
6          13     AGI_0m         11 bathy_mean   278.43
7          11 bathy_mean          5    uo_mean   268.20
8          11 bathy_mean          2   chl_mean   191.28
9          11 bathy_mean          1        lat   162.84
10         14 dist_coast          4   sal_mean   158.85
11          6 uostr_mean          1        lat   153.97
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_Yspat_Ntag, dat_test_agi,
                     n.trees = agi_0m_Yspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.2853753
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7446268
max(e@TPR + e@TNR -1) #TSS
[1] 0.907256
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_Yspat_Ntag)
[1] 85.12784
#eval 75/25
eval_7525_modified(agi_0m_Yspat_Ntag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4400 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
      RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.195868 0.921452 0.9884097 0.9994483         0.7941426 0.8512784
agi_0m_60m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[4])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862490
Residual.Deviance  0.2005887
Correlation        0.9556745
AUC                0.9975000
Per.Expl          85.5301104
cvDeviance         0.5068771
cvCorrelation      0.8393242
cvAUC              0.9601500
cvPer.Expl        63.4353465
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_Nspat_Ntag) 

              rel.inf
bathy_mean 33.7808593
temp_mean  20.4921335
AGI_0m     10.1395119
uostr_mean  6.1395169
AGI_60m     5.7829887
sal_mean    4.5642329
chl_mean    4.3701423
ssh_mean    3.8515309
vostr_mean  3.6254884
bathy_sd    1.6481303
uo_mean     1.6317208
mld_mean    1.5744189
vo_mean     1.4637452
pred_var    0.9355801
#explore partial plots
gbm.plot(agi_0m_60m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_Nspat_Ntag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          12     AGI_0m          2  temp_mean  8122.33
2          10 bathy_mean          2  temp_mean   616.02
3          10 bathy_mean          8   ssh_mean   594.41
4          12     AGI_0m          8   ssh_mean   501.98
5          13    AGI_60m         10 bathy_mean   480.31
6          10 bathy_mean          3   sal_mean   340.70
7           5 uostr_mean          2  temp_mean   239.76
8           9   mld_mean          6    vo_mean   198.10
9          13    AGI_60m          1   chl_mean   194.10
10         13    AGI_60m          3   sal_mean   172.10
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_Nspat_Ntag, dat_test_agi,
                     n.trees = agi_0m_60m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.2889722
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7442668
max(e@TPR + e@TNR -1) #TSS
[1] 0.9016135
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_Nspat_Ntag)
[1] 85.53011
#eval 75/25
eval_7525_modified(agi_0m_60m_Nspat_Ntag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5000 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1969633 0.9206625 0.9883319  1.001621         0.7915479 0.8553011
agi_0m_250m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[1])

# model performance 
ggBRT::ggPerformance(agi_0m_250m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862490
Residual.Deviance  0.2101199
Correlation        0.9516769
AUC                0.9969000
Per.Expl          84.8425585
cvDeviance         0.5009849
cvCorrelation      0.8408802
cvAUC              0.9607700
cvPer.Expl        63.8603934
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_250m_Nspat_Ntag) 

              rel.inf
bathy_mean 29.6981397
temp_mean  19.9110798
AGI_250m   13.1371225
AGI_0m      8.8212041
uostr_mean  6.9872820
sal_mean    4.6639715
ssh_mean    4.1044072
chl_mean    3.5367397
vostr_mean  2.1325830
bathy_sd    1.7771055
uo_mean     1.4966185
mld_mean    1.4421522
vo_mean     1.3513807
pred_var    0.9402136
#explore partial plots
gbm.plot(agi_0m_250m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_250m_Nspat_Ntag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          12     AGI_0m          2  temp_mean  7413.64
2          12     AGI_0m          8   ssh_mean   379.59
3           8   ssh_mean          4    uo_mean   376.09
4          13   AGI_250m          2  temp_mean   374.64
5          10 bathy_mean          2  temp_mean   352.02
6          13   AGI_250m         10 bathy_mean   349.05
7          10 bathy_mean          3   sal_mean   343.47
8          10 bathy_mean          4    uo_mean   264.93
9           4    uo_mean          2  temp_mean   209.75
10          5 uostr_mean          1   chl_mean   166.77
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_250m_Nspat_Ntag, dat_test_agi,
                     n.trees = agi_0m_250m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.2950992
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7440749
max(e@TPR + e@TNR -1) #TSS
[1] 0.9009173
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_250m_Nspat_Ntag)
[1] 84.84256
#eval 75/25
eval_7525_modified(agi_0m_250m_Nspat_Ntag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4600 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2000762 0.9178963 0.9878278 0.9987535         0.7871282 0.8484256
agi_0m_60m_250m_Nspat_Ntag <- readRDS(brt_outputs_Ntag[2])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_250m_Nspat_Ntag)
                     Model 1
Total.Deviance     1.3862490
Residual.Deviance  0.1855561
Correlation        0.9598511
AUC                0.9980000
Per.Expl          86.6145154
cvDeviance         0.4907170
cvCorrelation      0.8451783
cvAUC              0.9623500
cvPer.Expl        64.6010907
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_250m_Nspat_Ntag) 

              rel.inf
bathy_mean 29.9734289
temp_mean  20.3673209
AGI_250m   12.2941229
AGI_0m      8.4707152
uostr_mean  6.0490757
sal_mean    3.9485657
AGI_60m     3.5161350
chl_mean    3.5092053
ssh_mean    3.0434925
vostr_mean  2.0985932
bathy_sd    1.6409157
vo_mean     1.4220106
uo_mean     1.4121966
mld_mean    1.4016687
pred_var    0.8525533
#explore partial plots
gbm.plot(agi_0m_60m_250m_Nspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_250m_Nspat_Ntag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          12     AGI_0m          2  temp_mean  5744.72
2          12     AGI_0m          8   ssh_mean   599.57
3          10 bathy_mean          3   sal_mean   441.15
4          10 bathy_mean          2  temp_mean   382.86
5          14   AGI_250m          2  temp_mean   328.55
6          13    AGI_60m         10 bathy_mean   279.25
7          14   AGI_250m         10 bathy_mean   268.58
8          10 bathy_mean          8   ssh_mean   257.93
9          10 bathy_mean          4    uo_mean   232.67
10         12     AGI_0m          3   sal_mean   185.87
11         12     AGI_0m         10 bathy_mean   173.22
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_250m_Nspat_Ntag, dat_test_agi,
                     n.trees = agi_0m_60m_250m_Nspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.2664466
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7453912
max(e@TPR + e@TNR -1) #TSS
[1] 0.9115428
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_250m_Nspat_Ntag)
[1] 86.61452
#eval 75/25
eval_7525_modified(agi_0m_60m_250m_Nspat_Ntag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5050 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1883042 0.927851 0.9903983  1.000388         0.8077969 0.8661452
agi_0m_60m_250m_Yspat_Ntag <- readRDS(brt_outputs_Ntag[3])

# model performance 
ggBRT::ggPerformance(agi_0m_60m_250m_Yspat_Ntag)
                     Model 1
Total.Deviance     1.3862490
Residual.Deviance  0.1679933
Correlation        0.9648664
AUC                0.9986000
Per.Expl          87.8814450
cvDeviance         0.4631752
cvCorrelation      0.8543076
cvAUC              0.9664000
cvPer.Expl        66.5878820
#relative influence of predictors
ggBRT::ggInfluence(agi_0m_60m_250m_Yspat_Ntag) 

              rel.inf
dist_coast 51.3624256
lat         9.3914556
AGI_0m      6.7567291
bathy_mean  5.5060080
temp_mean   4.8081540
AGI_250m    4.4615978
chl_mean    3.1022312
sal_mean    3.0090800
AGI_60m     2.3105716
ssh_mean    1.7128404
vostr_mean  1.2870285
mld_mean    1.2286325
uo_mean     1.2228660
uostr_mean  1.1546359
vo_mean     0.9892946
bathy_sd    0.9517202
pred_var    0.7447290
#explore partial plots
gbm.plot(agi_0m_60m_250m_Yspat_Ntag, nplots = 10, plot.layout = c(3,5), write.title = FALSE) 

#find the 5 most important pairwise interactions 
agi_int <- gbm.interactions(agi_0m_60m_250m_Yspat_Ntag)
agi_int$rank.list
   var1.index var1.names var2.index var2.names int.size
1          13     AGI_0m          3  temp_mean  2706.06
2          13     AGI_0m          1        lat   480.87
3          13     AGI_0m         11 bathy_mean   300.50
4           6 uostr_mean          1        lat   293.87
5          11 bathy_mean          2   chl_mean   273.68
6          11 bathy_mean          3  temp_mean   269.12
7          16   AGI_250m         11 bathy_mean   265.48
8          14 dist_coast         10   mld_mean   219.03
9          11 bathy_mean          9   ssh_mean   209.13
10         13     AGI_0m          9   ssh_mean   176.75
11         11 bathy_mean          4   sal_mean   164.47
12          3  temp_mean          1        lat   147.70
13         11 bathy_mean          5    uo_mean   125.32
14         11 bathy_mean          1        lat   120.91
#predictive performance using test dataset 
preds <- predict.gbm(agi_0m_60m_250m_Yspat_Ntag, dat_test_agi,
                     n.trees = agi_0m_60m_250m_Yspat_Ntag$gbm.call$best.trees,
                     type = "response")

calc.deviance(obs = dat_test_agi$PA, preds) #get % deviance
[1] 0.2466946
dat_pred_agi <- cbind(dat_test_agi$PA, preds)
pres_agi <- dat_pred_agi[dat_pred_agi[,1] == 1, 2]
abs_agi <- dat_pred_agi[dat_pred_agi[,1] == 0, 2]

#evaluate (AUC, TSS, TPR)
e = evaluate(p = pres_agi, a = abs_agi)
plot(e, 'TPR')

plot(e, 'TNR')

plot(e, 'ROC')

boxplot(e)

density(e)

mean(e@TPR) #TPR
[1] 0.7463962
max(e@TPR + e@TNR -1) #TSS
[1] 0.9225519
#provides % deviance for model selection - unsure if this is different from above calc.deviance
dev_eval_brt(agi_0m_60m_250m_Yspat_Ntag)
[1] 87.88144
#eval 75/25
eval_7525_modified(agi_0m_60m_250m_Yspat_Ntag, 
                 testInput = dat_test_agi, 
                 gbm.y = "PA")
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5050 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1793447 0.9348399 0.9915013 0.9995818         0.8220452 0.8788144

Summary table of results

output_sum_Ntag <- read.csv(here("data/brt/mod_outputs/brt_bckg_output_summary_Ntag.csv"))
kableExtra::kable(output_sum_Ntag)
model percent_explained calc_deviance TPR_mean TSS AUC RMSE SpearmanCor C.index PredRatio PseudoR2
base_0m_Nspat_Ntag 79.360 0.375 0.740 0.872 0.980 0.228 0.891 0.980 0.999 0.794
do_0m_Nspat_Ntag 84.210 0.316 0.743 0.899 0.986 0.208 0.911 0.986 0.998 0.842
do_0m_Yspat_Ntag 85.930 0.286 0.744 0.907 0.988 0.196 0.921 0.988 0.997 0.859
do_0m_60m_Nspat_Ntag 84.480 0.305 0.743 0.901 0.986 0.204 0.914 0.986 0.996 0.844
do_0m_250m_Nspat_Ntag 83.160 0.323 0.742 0.890 0.985 0.212 0.907 0.985 0.997 0.832
do_0m_60m_250m_Nspat_Ntag 85.800 0.292 0.744 0.907 0.987 0.198 0.919 0.987 0.999 0.858
do_0m_60m_250m_Yspat_Ntag 87.650 0.261 0.745 0.922 0.990 0.185 0.930 0.990 0.997 0.877
agi_0m_Nspat_Ntag 82.000 0.327 0.742 0.890 0.985 0.212 0.908 0.985 0.999 0.820
agi_0m_Yspat_Ntag 85.130 0.285 0.745 0.907 0.988 0.196 0.921 0.988 0.999 0.851
agi_0m_60m_Nspat_Ntag 85.530 0.289 0.744 0.902 0.988 0.197 0.921 0.988 1.000 0.855
agi_0m_250m_Nspat_Ntag 84.843 0.295 0.744 0.901 0.988 0.200 0.918 0.988 0.999 0.848
agi_0m_60m_250m_Nspat_Ntag 86.610 0.266 0.745 0.912 0.990 0.188 0.928 0.990 1.000 0.866
agi_0m_60m_250m_Yspat_Ntag 87.880 0.247 0.746 0.923 0.992 0.179 0.935 0.992 1.000 0.879
ggplot(output_sum_Ntag, aes(AUC, TSS, color = percent_explained, label = model)) + 
  geom_point(size = 3) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                  box.padding   = 0.35, 
                  point.padding = 0.5,
                  segment.color = 'grey50')

Base models w/o tag ID and w/ data at seasonal and annual resolutions

For these models, the environmental raster data was averaged according to season and year. Observed and pseudo absence locations were then used for environmental data extraction along these raster files and were matched to each file according to either the season or year.

explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_base_0m_seas_Nspat_Ntag.rds",
            test_data = dat_test_base_s)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862930
Residual.Deviance  0.3717943
Correlation        0.8912655
AUC                0.9811000
Per.Expl          73.1806827
cvDeviance         0.5434223
cvCorrelation      0.8203757
cvAUC              0.9544500
cvPer.Expl        60.8003302
[1] "Relative influence of predictor variables"

              rel.inf
vo_mean    37.6115543
vostr_mean 12.9884480
bathy_mean  9.2018678
uostr_mean  9.0931029
ssh_mean    8.2555616
sal_mean    6.5148137
temp_mean   5.1868409
mld_mean    4.0654653
chl_mean    2.9151965
uo_mean     2.0919827
bathy_sd    1.2334844
pred_var    0.8416821
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1          2   sal_mean          1   mld_mean   907.84
2         10 bathy_mean          6 uostr_mean   469.57
3          8 vostr_mean          4  temp_mean   379.84
4         10 bathy_mean          2   sal_mean   261.59
5          7    vo_mean          5    uo_mean   219.43
6          3   ssh_mean          2   sal_mean   152.45
7          8 vostr_mean          7    vo_mean   141.19
[1] "Calculated percent deviance"
[1] 0.4192523

[1] "TPR"
[1] 0.7364798
[1] "TSS"
[1] 0.8396551
[1] "Percent deviance calculated by in house functions"
[1] 73.18068
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4500 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2454028 0.8722718 0.9737081  1.003879         0.6974331 0.7318068
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_base_0m_ann_Nspat_Ntag.rds",
            test_data = dat_test_base_a)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862892
Residual.Deviance  0.3209531
Correlation        0.9119722
AUC                0.9875000
Per.Expl          76.8480398
cvDeviance         0.5356092
cvCorrelation      0.8241442
cvAUC              0.9553500
cvPer.Expl        61.3638176
[1] "Relative influence of predictor variables"

              rel.inf
vo_mean    38.8075023
vostr_mean 16.2673317
uostr_mean 11.4492219
bathy_mean 10.1191174
chl_mean    5.6156607
sal_mean    4.5053210
temp_mean   3.5987576
ssh_mean    2.6590283
mld_mean    2.5230017
uo_mean     2.0857363
bathy_sd    1.4172504
pred_var    0.9520707
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1          8 vostr_mean          6 uostr_mean  1285.85
2          3   ssh_mean          1   mld_mean   541.81
3          8 vostr_mean          3   ssh_mean   540.45
4          7    vo_mean          6 uostr_mean   534.47
5          8 vostr_mean          1   mld_mean   376.41
6          8 vostr_mean          4  temp_mean   354.21
7         10 bathy_mean          8 vostr_mean   334.03
[1] "Calculated percent deviance"
[1] 0.3774674

[1] "TPR"
[1] 0.7394418
[1] "TSS"
[1] 0.8624481
[1] "Percent deviance calculated by in house functions"
[1] 76.84804
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
6300 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2307242 0.8885542 0.9793393  1.004532         0.7277093 0.7684804

DO models w/o tag ID and w/ data at seasonal and annual resolutions

explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_do_0m_60m_250m_seas_Nspat_Ntag.rds",
            test_data = dat_test_do_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862840
Residual.Deviance  0.2418172
Correlation        0.9401701
AUC                0.9946000
Per.Expl          82.5564495
cvDeviance         0.4926303
cvCorrelation      0.8411094
cvAUC              0.9618100
cvPer.Expl        64.4639732
[1] "Relative influence of predictor variables"

                     rel.inf
o2_mean_0m_seas   29.9550272
bathy_mean        28.1716202
o2_mean_250m_seas 13.6726213
o2_mean_60m_seas   7.3518430
chl_mean           3.5030980
ssh_mean           3.2048426
temp_mean          3.0233286
sal_mean           2.5361519
bathy_sd           1.5804684
uostr_mean         1.4283298
mld_mean           1.3660243
vostr_mean         1.2743367
uo_mean            1.2458839
vo_mean            0.9471799
pred_var           0.7392442
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index var2.names int.size
1          14  o2_mean_60m_seas          3   sal_mean   286.26
2          10        bathy_mean          4    uo_mean   245.71
3          13   o2_mean_0m_seas          8   ssh_mean   230.00
4          15 o2_mean_250m_seas         10 bathy_mean   227.68
5          10        bathy_mean          1   chl_mean   221.03
6          10        bathy_mean          3   sal_mean   201.17
7          14  o2_mean_60m_seas         10 bathy_mean   188.40
8          13   o2_mean_0m_seas          2  temp_mean   171.51
9          13   o2_mean_0m_seas         10 bathy_mean   157.14
10         14  o2_mean_60m_seas          2  temp_mean   154.23
11         10        bathy_mean          8   ssh_mean   138.67
[1] "Calculated percent deviance"
[1] 0.3103337

[1] "TPR"
[1] 0.7433213
[1] "TSS"
[1] 0.8939959
[1] "Percent deviance calculated by in house functions"
[1] 82.55645
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7100 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2061117 0.9125669 0.9864388  1.001678         0.7761335 0.8255645
explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_do_0m_60m_250m_seas_Yspat_Ntag.rds",
            test_data = dat_test_do_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862840
Residual.Deviance  0.2287664
Correlation        0.9445079
AUC                0.9954000
Per.Expl          83.4978693
cvDeviance         0.4764336
cvCorrelation      0.8476000
cvAUC              0.9640900
cvPer.Expl        65.6323258
[1] "Relative influence of predictor variables"

                     rel.inf
dist_coast        51.6082639
o2_mean_0m_seas   11.3961720
o2_mean_250m_seas  7.7265238
o2_mean_60m_seas   4.5488533
bathy_mean         3.9480523
lat                3.9238771
chl_mean           3.0456122
temp_mean          2.7586462
sal_mean           2.2400707
ssh_mean           1.9081300
mld_mean           1.2771601
vostr_mean         1.1909067
uo_mean            1.1179762
bathy_sd           0.9683763
uostr_mean         0.8761205
vo_mean            0.8223314
pred_var           0.6429273
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index var2.names int.size
1          17 o2_mean_250m_seas         11 bathy_mean   272.35
2          11        bathy_mean          3  temp_mean   270.80
3          11        bathy_mean          2   chl_mean   265.35
4           6        uostr_mean          1        lat   196.73
5          16  o2_mean_60m_seas         11 bathy_mean   181.67
6          15   o2_mean_0m_seas          1        lat   175.06
7          16  o2_mean_60m_seas          4   sal_mean   175.02
8          15   o2_mean_0m_seas          3  temp_mean   167.89
9          11        bathy_mean          4   sal_mean   158.34
10         11        bathy_mean          5    uo_mean   157.94
11         11        bathy_mean          9   ssh_mean   144.56
12         16  o2_mean_60m_seas          3  temp_mean   142.99
13         13        dist_coast         10   mld_mean   142.57
14         13        dist_coast          8 vostr_mean   130.50
[1] "Calculated percent deviance"
[1] 0.2975476

[1] "TPR"
[1] 0.744045
[1] "TSS"
[1] 0.8977673
[1] "Percent deviance calculated by in house functions"
[1] 83.49787
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7200 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2013708 0.916764 0.9875883  1.000038          0.785357 0.8349787
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_ann_Nspat_Ntag.rds",
            test_data = dat_test_do_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862840
Residual.Deviance  0.2658281
Correlation        0.9337458
AUC                0.9934000
Per.Expl          80.8244119
cvDeviance         0.5297688
cvCorrelation      0.8266086
cvAUC              0.9564300
cvPer.Expl        61.7849717
[1] "Relative influence of predictor variables"

                    rel.inf
bathy_mean       27.9042605
o2_mean_0m_ann   22.5395832
o2_mean_250m_ann 14.4483395
temp_mean         7.5881348
o2_mean_60m_ann   6.8540577
chl_mean          4.2914489
sal_mean          3.5883053
ssh_mean          2.8955778
uostr_mean        1.9748209
vostr_mean        1.6451343
mld_mean          1.5488336
bathy_sd          1.4989186
uo_mean           1.4097996
vo_mean           1.0829564
pred_var          0.7298289
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index       var1.names var2.index     var2.names int.size
1          14  o2_mean_60m_ann          2      temp_mean   391.78
2          10       bathy_mean          4        uo_mean   212.96
3          10       bathy_mean          2      temp_mean   170.12
4          10       bathy_mean          1       chl_mean   153.06
5          14  o2_mean_60m_ann         10     bathy_mean   151.00
6          10       bathy_mean          3       sal_mean   147.84
7          10       bathy_mean          8       ssh_mean   145.21
8           8         ssh_mean          5     uostr_mean   126.87
9          15 o2_mean_250m_ann         13 o2_mean_0m_ann   126.07
10         14  o2_mean_60m_ann         13 o2_mean_0m_ann   111.62
11         15 o2_mean_250m_ann          2      temp_mean   111.47
[1] "Calculated percent deviance"
[1] 0.3318456

[1] "TPR"
[1] 0.7424093
[1] "TSS"
[1] 0.8858185
[1] "Percent deviance calculated by in house functions"
[1] 80.82441
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7200 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2135096 0.9062035 0.9851365  1.000257         0.7606153 0.8082441
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_ann_Yspat_Ntag.rds",
            test_data = dat_test_do_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862840
Residual.Deviance  0.2401510
Correlation        0.9422127
AUC                0.9951000
Per.Expl          82.6766351
cvDeviance         0.5078189
cvCorrelation      0.8345401
cvAUC              0.9599600
cvPer.Expl        63.3683366
[1] "Relative influence of predictor variables"

                    rel.inf
dist_coast       52.4405492
lat               7.1991001
o2_mean_250m_ann  5.9536531
o2_mean_0m_ann    5.8598880
bathy_mean        4.8365615
chl_mean          3.9782780
temp_mean         3.5566644
o2_mean_60m_ann   3.4031130
sal_mean          2.8806083
ssh_mean          2.2956462
vostr_mean        1.4221959
mld_mean          1.2744067
uo_mean           1.2559427
bathy_sd          1.0945539
uostr_mean        0.9547095
vo_mean           0.9419010
pred_var          0.6522284
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index       var1.names var2.index var2.names int.size
1          11       bathy_mean          3  temp_mean   515.53
2          16  o2_mean_60m_ann         11 bathy_mean   370.15
3          11       bathy_mean          2   chl_mean   340.14
4           6       uostr_mean          1        lat   262.36
5          11       bathy_mean          5    uo_mean   188.30
6          15   o2_mean_0m_ann          1        lat   171.06
7          16  o2_mean_60m_ann          3  temp_mean   166.92
8          11       bathy_mean          9   ssh_mean   157.85
9           8       vostr_mean          5    uo_mean   143.28
10         16  o2_mean_60m_ann          1        lat   124.98
11         17 o2_mean_250m_ann          1        lat   118.68
12         13       dist_coast         10   mld_mean   103.23
13          3        temp_mean          1        lat   101.93
14         11       bathy_mean          4   sal_mean    93.56
[1] "Calculated percent deviance"
[1] 0.3027324

[1] "TPR"
[1] 0.7439359
[1] "TSS"
[1] 0.9025282
[1] "Percent deviance calculated by in house functions"
[1] 82.67664
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7700 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2024333 0.9163195 0.9879827 0.9982252         0.7816168 0.8267664
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_dail_seas_ann_Nspat_Ntag.rds",
            test_data = dat_test_do_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862840
Residual.Deviance  0.1835771
Correlation        0.9599527
AUC                0.9980000
Per.Expl          86.7576132
cvDeviance         0.4558125
cvCorrelation      0.8568120
cvAUC              0.9668800
cvPer.Expl        67.1198347
[1] "Relative influence of predictor variables"

                     rel.inf
bathy_mean        26.6902874
o2_mean_0m        20.7771528
o2_mean_250m_seas  9.4205590
o2_mean_0m_seas    8.9795725
o2_mean_60m_seas   5.5714036
o2_mean_250m_ann   3.9953438
o2_mean_0m_ann     3.5155437
chl_mean           2.7120000
temp_mean          2.4568315
o2_mean_60m_ann    2.2287795
ssh_mean           2.1615954
sal_mean           1.9774404
o2_mean_60m        1.7098348
o2_mean_250m       1.5994560
bathy_sd           1.0616884
mld_mean           1.0452667
vostr_mean         1.0366101
uo_mean            0.9321002
uostr_mean         0.8335454
vo_mean            0.7263540
pred_var           0.5686348
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index       var2.names int.size
1          18 o2_mean_250m_seas         11       bathy_mean   337.89
2           3         temp_mean          1       o2_mean_0m   274.53
3          13       o2_mean_60m         11       bathy_mean   237.43
4          20   o2_mean_60m_ann         11       bathy_mean   208.42
5          18 o2_mean_250m_seas         14     o2_mean_250m   204.43
6          11        bathy_mean          3        temp_mean   201.05
7          11        bathy_mean          4         sal_mean   176.86
8          16   o2_mean_0m_seas         12         bathy_sd   168.86
9          11        bathy_mean          2         chl_mean   160.33
10         11        bathy_mean          5          uo_mean   157.91
11         17  o2_mean_60m_seas          4         sal_mean   148.67
12          2          chl_mean          1       o2_mean_0m   118.15
13         19    o2_mean_0m_ann          4         sal_mean   117.62
14         16   o2_mean_0m_seas          9         ssh_mean   116.45
15         19    o2_mean_0m_ann          3        temp_mean   113.55
16         11        bathy_mean          9         ssh_mean   112.36
17         20   o2_mean_60m_ann         17 o2_mean_60m_seas    90.36
18         19    o2_mean_0m_ann         17 o2_mean_60m_seas    82.73
19         17  o2_mean_60m_seas         11       bathy_mean    82.57
20          9          ssh_mean          4         sal_mean    80.14
21         20   o2_mean_60m_ann          3        temp_mean    76.93
22         16   o2_mean_0m_seas          4         sal_mean    69.88
[1] "Calculated percent deviance"
[1] 0.2450293

[1] "TPR"
[1] 0.7464499
[1] "TSS"
[1] 0.928109
[1] "Percent deviance calculated by in house functions"
[1] 86.75761
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8300 iterations were performed.
There were 21 predictors of which 21 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1783497 0.9358277 0.9920179  1.000986         0.8232424 0.8675761
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_do_0m_60m_250m_dail_seas_ann_Yspat_Ntag.rds",
            test_data = dat_test_do_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862840
Residual.Deviance  0.1743310
Correlation        0.9628367
AUC                0.9983000
Per.Expl          87.4245846
cvDeviance         0.4469638
cvCorrelation      0.8599570
cvAUC              0.9680700
cvPer.Expl        67.7581364
[1] "Relative influence of predictor variables"

                     rel.inf
dist_coast        49.3246938
o2_mean_0m         8.2391043
o2_mean_0m_seas    5.1503728
o2_mean_250m_seas  5.0655830
lat                3.5620562
bathy_mean         3.3679359
o2_mean_60m_seas   3.0915358
o2_mean_250m_ann   2.7603353
chl_mean           2.5636051
temp_mean          2.2305884
sal_mean           1.9585958
o2_mean_60m        1.6960502
o2_mean_250m       1.6332495
o2_mean_60m_ann    1.5901724
ssh_mean           1.4767219
vostr_mean         0.9417095
mld_mean           0.9346010
o2_mean_0m_ann     0.9281548
uo_mean            0.8093395
bathy_sd           0.7440029
uostr_mean         0.7288300
vo_mean            0.6799355
pred_var           0.5228264
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index        var2.names int.size
1          22   o2_mean_60m_ann         12        bathy_mean   511.25
2          12        bathy_mean          3          chl_mean   318.95
3          20 o2_mean_250m_seas         12        bathy_mean   313.81
4          20 o2_mean_250m_seas         16      o2_mean_250m   313.27
5          15       o2_mean_60m         12        bathy_mean   281.36
6          12        bathy_mean          4         temp_mean   246.45
7           4         temp_mean          2        o2_mean_0m   224.25
8          12        bathy_mean          5          sal_mean   156.85
9          14        dist_coast          9        vostr_mean   148.54
10         14        dist_coast         11          mld_mean    98.98
11         19  o2_mean_60m_seas          5          sal_mean    96.21
12          2        o2_mean_0m          1               lat    92.41
13          7        uostr_mean          1               lat    92.40
14         21    o2_mean_0m_ann          1               lat    88.81
15          3          chl_mean          2        o2_mean_0m    88.02
16         23  o2_mean_250m_ann         20 o2_mean_250m_seas    84.94
17         12        bathy_mean         10          ssh_mean    77.14
18         12        bathy_mean          6           uo_mean    76.75
19         15       o2_mean_60m          4         temp_mean    75.81
20          9        vostr_mean          7        uostr_mean    66.87
21         12        bathy_mean          1               lat    64.02
22          9        vostr_mean          2        o2_mean_0m    63.47
23         18   o2_mean_0m_seas         13          bathy_sd    60.77
24         18   o2_mean_0m_seas          1               lat    59.88
25         19  o2_mean_60m_seas         12        bathy_mean    58.55
26         15       o2_mean_60m          6           uo_mean    55.64
[1] "Calculated percent deviance"
[1] 0.237065

[1] "TPR"
[1] 0.7468409
[1] "TSS"
[1] 0.9323182
[1] "Percent deviance calculated by in house functions"
[1] 87.42458
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8450 iterations were performed.
There were 23 predictors of which 23 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1745892 0.9385744 0.9923763 0.9989385         0.8289876 0.8742458

AGI models w/o tag ID and w/ data at seasonal and annual resolutions

explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_agi_0m_60m_250m_seas_Nspat_Ntag.rds",
            test_data = dat_test_agi_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862811
Residual.Deviance  0.2321944
Correlation        0.9450501
AUC                0.9958000
Per.Expl          83.2505563
cvDeviance         0.5034843
cvCorrelation      0.8378703
cvAUC              0.9602500
cvPer.Expl        63.6809390
[1] "Relative influence of predictor variables"

                 rel.inf
bathy_mean    28.4568141
temp_mean     20.6998026
AGI_250m_seas 15.6249042
uostr_mean     5.7123417
AGI_0m_seas    5.2282459
sal_mean       4.9112847
AGI_60m_seas   4.1076147
chl_mean       3.4420183
ssh_mean       3.1190152
bathy_sd       1.8936717
vostr_mean     1.7388308
mld_mean       1.6483372
uo_mean        1.4582145
vo_mean        1.3148051
pred_var       0.6440993
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index var2.names int.size
1          10    bathy_mean          3   sal_mean   771.03
2          15 AGI_250m_seas          2  temp_mean   526.27
3          10    bathy_mean          2  temp_mean   384.13
4          13   AGI_0m_seas          6    vo_mean   265.66
5           6       vo_mean          3   sal_mean   224.98
6          14  AGI_60m_seas          2  temp_mean   186.12
7          15 AGI_250m_seas          3   sal_mean   173.26
8          15 AGI_250m_seas         10 bathy_mean   168.54
9          13   AGI_0m_seas          3   sal_mean   157.30
10          2     temp_mean          1   chl_mean   147.38
11         14  AGI_60m_seas         10 bathy_mean   126.45
[1] "Calculated percent deviance"
[1] 0.3053554

[1] "TPR"
[1] 0.7435394
[1] "TSS"
[1] 0.9009719
[1] "Percent deviance calculated by in house functions"
[1] 83.25056
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8500 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.203208 0.9152552 0.9868737  1.000793         0.7797258 0.8325056
explore_brt(mod_file_path = "data/brt/mod_outputs/background/seasonal/brt_agi_0m_60m_250m_seas_Yspat_Ntag.rds",
            test_data = dat_test_agi_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862811
Residual.Deviance  0.2237274
Correlation        0.9471479
AUC                0.9962000
Per.Expl          83.8613221
cvDeviance         0.4860336
cvCorrelation      0.8436767
cvAUC              0.9623400
cvPer.Expl        64.9397517
[1] "Relative influence of predictor variables"

                 rel.inf
dist_coast    52.5815059
lat            6.9593938
AGI_250m_seas  6.5692316
temp_mean      5.2945894
bathy_mean     4.9780355
AGI_0m_seas    4.5385997
AGI_60m_seas   3.7517941
chl_mean       3.3517654
sal_mean       3.2690448
ssh_mean       1.7173600
mld_mean       1.3612586
vostr_mean     1.2099019
uo_mean        1.1259297
uostr_mean     0.9964419
bathy_sd       0.8685392
vo_mean        0.8486002
pred_var       0.5780083
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index var2.names int.size
1          11    bathy_mean          3  temp_mean   444.33
2          13    dist_coast         10   mld_mean   334.59
3          17 AGI_250m_seas         11 bathy_mean   268.59
4          15   AGI_0m_seas          4   sal_mean   177.64
5          13    dist_coast          8 vostr_mean   164.30
6          15   AGI_0m_seas          7    vo_mean   156.50
7          16  AGI_60m_seas         11 bathy_mean   150.04
8          11    bathy_mean          9   ssh_mean   139.48
9           3     temp_mean          1        lat   135.73
10          4      sal_mean          1        lat   133.75
11         11    bathy_mean          2   chl_mean   131.79
12          6    uostr_mean          1        lat   125.32
13         13    dist_coast          1        lat   106.60
14         15   AGI_0m_seas          2   chl_mean    84.71
[1] "Calculated percent deviance"
[1] 0.2881212

[1] "TPR"
[1] 0.7445508
[1] "TSS"
[1] 0.9054166
[1] "Percent deviance calculated by in house functions"
[1] 83.86132
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7950 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1967264 0.9209808 0.9886739  1.001141          0.792158 0.8386132
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_ann_Nspat_Ntag.rds",
            test_data = dat_test_agi_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862811
Residual.Deviance  0.2613223
Correlation        0.9358527
AUC                0.9941000
Per.Expl          81.1493964
cvDeviance         0.5379650
cvCorrelation      0.8231527
cvAUC              0.9551000
cvPer.Expl        61.1936563
[1] "Relative influence of predictor variables"

                rel.inf
bathy_mean   29.3779108
temp_mean    20.9455029
AGI_250m_ann 14.6419280
uostr_mean    6.3928196
sal_mean      5.0720853
AGI_60m_ann   4.5467677
chl_mean      4.0243117
ssh_mean      3.2096623
AGI_0m_ann    2.6759034
bathy_sd      2.0730979
vostr_mean    1.9352694
mld_mean      1.6660524
uo_mean       1.5344903
vo_mean       1.2263531
pred_var      0.6778453
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index var2.names int.size
1          10   bathy_mean          3   sal_mean   650.32
2          14  AGI_60m_ann         10 bathy_mean   393.12
3          15 AGI_250m_ann          2  temp_mean   310.21
4          13   AGI_0m_ann          2  temp_mean   299.08
5          10   bathy_mean          2  temp_mean   284.67
6          13   AGI_0m_ann          3   sal_mean   242.75
7           8     ssh_mean          5 uostr_mean   233.15
8           2    temp_mean          1   chl_mean   215.23
9          14  AGI_60m_ann          2  temp_mean   148.02
10         15 AGI_250m_ann         13 AGI_0m_ann   139.98
11          6      vo_mean          3   sal_mean   131.65
[1] "Calculated percent deviance"
[1] 0.3306731

[1] "TPR"
[1] 0.7424086
[1] "TSS"
[1] 0.8853276
[1] "Percent deviance calculated by in house functions"
[1] 81.1494
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8000 iterations were performed.
There were 15 predictors of which 15 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained PseudoR2
1 0.2134505 0.9061452 0.9852186  1.002061         0.7614624 0.811494
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_ann_Yspat_Ntag.rds",
            test_data = dat_test_agi_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862811
Residual.Deviance  0.2210721
Correlation        0.9490763
AUC                0.9966000
Per.Expl          84.0528649
cvDeviance         0.5036031
cvCorrelation      0.8359368
cvAUC              0.9606200
cvPer.Expl        63.6723657
[1] "Relative influence of predictor variables"

                rel.inf
dist_coast   51.3678090
lat           8.3057941
AGI_60m_ann   6.0633677
bathy_mean    5.4719648
AGI_250m_ann  5.0030563
temp_mean     4.6716351
chl_mean      3.9646719
sal_mean      3.5807263
ssh_mean      2.2066992
AGI_0m_ann    2.0502544
mld_mean      1.3602811
vostr_mean    1.3535883
uo_mean       1.1243457
bathy_sd      1.0313205
uostr_mean    0.9999780
vo_mean       0.8202189
pred_var      0.6242886
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index   var1.names var2.index  var2.names int.size
1          16  AGI_60m_ann         11  bathy_mean   597.37
2          11   bathy_mean          3   temp_mean   358.97
3          17 AGI_250m_ann         16 AGI_60m_ann   194.96
4          15   AGI_0m_ann          4    sal_mean   191.27
5           4     sal_mean          1         lat   180.01
6          13   dist_coast         10    mld_mean   178.93
7           6   uostr_mean          1         lat   169.73
8          11   bathy_mean          2    chl_mean   167.64
9          11   bathy_mean          9    ssh_mean   156.75
10          3    temp_mean          1         lat   146.44
11         13   dist_coast          8  vostr_mean   136.64
12          2     chl_mean          1         lat   134.35
13         11   bathy_mean          1         lat   126.90
14         16  AGI_60m_ann          4    sal_mean   109.84
[1] "Calculated percent deviance"
[1] 0.2918624

[1] "TPR"
[1] 0.7443891
[1] "TSS"
[1] 0.9018302
[1] "Percent deviance calculated by in house functions"
[1] 84.05286
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8650 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1982769 0.9197381 0.9885987  1.001188         0.7894592 0.8405286
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_dail_seas_ann_Nspat_Ntag.rds",
            test_data = dat_test_agi_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862811
Residual.Deviance  0.1785652
Correlation        0.9621288
AUC                0.9983000
Per.Expl          87.1191229
cvDeviance         0.4548805
cvCorrelation      0.8575571
cvAUC              0.9665200
cvPer.Expl        67.1869914
[1] "Relative influence of predictor variables"

                 rel.inf
bathy_mean    27.2218447
temp_mean     19.1146917
AGI_250m_seas  9.8455447
AGI_0m         6.6126489
uostr_mean     5.3813311
sal_mean       3.7622783
AGI_0m_seas    3.5537907
AGI_250m       3.1561957
AGI_250m_ann   3.0401735
AGI_60m_ann    2.8710090
chl_mean       2.3231281
ssh_mean       2.1705750
AGI_60m_seas   2.1206609
AGI_60m        1.4160122
vostr_mean     1.3316066
bathy_sd       1.3253579
AGI_0m_ann     1.2533802
uo_mean        1.0833600
vo_mean        1.0068085
mld_mean       0.9705674
pred_var       0.4390348
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index  var2.names int.size
1          12        AGI_0m          2   temp_mean  3157.47
2          20   AGI_60m_ann         10  bathy_mean   383.13
3          10    bathy_mean          3    sal_mean   293.95
4          19    AGI_0m_ann         16 AGI_0m_seas   285.37
5          19    AGI_0m_ann         10  bathy_mean   269.38
6          16   AGI_0m_seas          6     vo_mean   264.72
7          20   AGI_60m_ann         16 AGI_0m_seas   263.95
8          16   AGI_0m_seas         13     AGI_60m   211.00
9          12        AGI_0m         10  bathy_mean   195.06
10         18 AGI_250m_seas         10  bathy_mean   182.52
11         18 AGI_250m_seas          2   temp_mean   180.05
12         10    bathy_mean          2   temp_mean   157.93
13          4       uo_mean          2   temp_mean   127.37
14         19    AGI_0m_ann         11    bathy_sd   123.67
15         12        AGI_0m          8    ssh_mean   109.90
16         12        AGI_0m          3    sal_mean   101.87
17         16   AGI_0m_seas          4     uo_mean   101.51
18         18 AGI_250m_seas          3    sal_mean    89.27
19         21  AGI_250m_ann         14    AGI_250m    83.72
20         16   AGI_0m_seas          2   temp_mean    76.37
21         21  AGI_250m_ann          3    sal_mean    75.91
22         12        AGI_0m         11    bathy_sd    74.93
[1] "Calculated percent deviance"
[1] 0.2444309

[1] "TPR"
[1] 0.7462479
[1] "TSS"
[1] 0.927292
[1] "Percent deviance calculated by in house functions"
[1] 87.11912
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8800 iterations were performed.
There were 21 predictors of which 21 had non-zero influence.
       RMSE      Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1772743 0.936553 0.9916911 0.9996831         0.8236749 0.8711912
explore_brt(mod_file_path = "data/brt/mod_outputs/background/annual/brt_agi_0m_60m_250m_dail_seas_ann_Yspat_Ntag.rds",
            test_data = dat_test_agi_all)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862811
Residual.Deviance  0.1715113
Correlation        0.9638951
AUC                0.9985000
Per.Expl          87.6279537
cvDeviance         0.4400447
cvCorrelation      0.8619151
cvAUC              0.9686500
cvPer.Expl        68.2571797
[1] "Relative influence of predictor variables"

                 rel.inf
dist_coast    50.4410262
lat            7.2678587
AGI_0m         5.6576717
AGI_60m_ann    4.4450574
bathy_mean     4.2815646
temp_mean      3.3935652
AGI_0m_seas    2.9909112
AGI_250m_seas  2.6539915
sal_mean       2.4793934
chl_mean       2.3790803
AGI_60m_seas   2.0670030
AGI_250m_ann   1.7341816
AGI_250m       1.6426522
ssh_mean       1.3367289
AGI_60m        1.1879353
AGI_0m_ann     1.1650316
vostr_mean     0.8987196
uo_mean        0.8292838
mld_mean       0.7767813
uostr_mean     0.7072603
vo_mean        0.6524355
bathy_sd       0.6298252
pred_var       0.3820413
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index  var2.names int.size
1          13        AGI_0m          3   temp_mean  1024.13
2          22   AGI_60m_ann         11  bathy_mean   547.36
3          13        AGI_0m          1         lat   469.15
4          18   AGI_0m_seas          7     vo_mean   335.18
5          13        AGI_0m         11  bathy_mean   241.70
6          18   AGI_0m_seas         15     AGI_60m   223.49
7          21    AGI_0m_ann         11  bathy_mean   223.23
8          21    AGI_0m_ann         18 AGI_0m_seas   206.43
9          20 AGI_250m_seas         11  bathy_mean   199.82
10         14    dist_coast          8  vostr_mean   186.60
11         22   AGI_60m_ann         18 AGI_0m_seas   159.23
12          6    uostr_mean          1         lat   137.96
13         14    dist_coast         10    mld_mean   120.60
14         13        AGI_0m          6  uostr_mean    92.94
15         15       AGI_60m         11  bathy_mean    82.93
16         11    bathy_mean          3   temp_mean    80.85
17          4      sal_mean          1         lat    80.17
18         11    bathy_mean          2    chl_mean    74.27
19         20 AGI_250m_seas         16    AGI_250m    62.00
20         11    bathy_mean          5     uo_mean    60.05
21         19  AGI_60m_seas         13      AGI_0m    56.78
22         23  AGI_250m_ann          4    sal_mean    53.45
23         11    bathy_mean          9    ssh_mean    51.53
24         18   AGI_0m_seas          4    sal_mean    49.55
25         14    dist_coast          1         lat    47.46
26          3     temp_mean          1         lat    46.71
[1] "Calculated percent deviance"
[1] 0.2369278

[1] "TPR"
[1] 0.7466861
[1] "TSS"
[1] 0.9304143
[1] "Percent deviance calculated by in house functions"
[1] 87.62795
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8650 iterations were performed.
There were 23 predictors of which 23 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1747118 0.9383493 0.9920527  1.000018         0.8290874 0.8762795

Summary table of results

output_sum_seas_ann <- read.csv(here("data/brt/mod_outputs/brt_background_seas_ann_output_summary.csv"))
kableExtra::kable(output_sum)
model percent_explained calc_deviance TPR_mean TSS AUC RMSE SpearmanCor C.index PredRatio PseudoR2
base_0m_Nspat_Ntag 79.360 0.375 0.740 0.872 0.980 0.228 0.891 0.980 0.998 0.794
base_0m_Nspat_Ytag 91.990 0.182 0.759 0.957 0.994 0.147 0.957 0.994 0.994 0.920
base_0m_Yspat_Ytag 93.500 0.154 0.771 0.965 0.995 0.134 0.964 0.995 0.990 0.935
do_0m_Nspat_Ytag 93.820 0.131 0.769 0.976 0.997 0.120 0.972 0.997 0.998 0.938
do_0m_Yspat_Ytag 94.870 0.114 0.781 0.977 0.998 0.112 0.975 0.998 0.998 0.949
do_0m_60m_Nspat_Ytag 95.007 0.118 0.772 0.978 0.997 0.113 0.975 0.997 0.997 0.950
do_0m_250m_Nspat_Ytag 94.430 0.123 0.777 0.976 0.997 0.117 0.973 0.997 0.997 0.944
do_0m_60m_250m_Nspat_Ytag 95.380 0.113 0.778 0.979 0.997 0.111 0.975 0.997 0.997 0.954
do_0m_60m_250m_Yspat_Ytag 95.590 0.103 0.783 0.978 0.998 0.107 0.977 0.998 0.996 0.956
agi_0m_Nspat_Ytag 93.470 0.143 0.764 0.970 0.996 0.127 0.968 0.996 0.999 0.934
agi_0m_Yspat_Ytag 94.560 0.124 0.776 0.976 0.997 0.117 0.973 0.997 0.996 0.946
agi_0m_60m_Nspat_Ytag 94.111 0.131 0.764 0.971 0.997 0.121 0.971 0.997 0.998 0.941
agi_0m_250m_Nspat_Ytag 93.990 0.137 0.769 0.970 0.996 0.124 0.970 0.996 0.997 0.940
agi_0m_60m_250m_Nspat_Ytag 94.540 0.129 0.768 0.973 0.996 0.118 0.972 0.996 0.998 0.945
agi_0m_60m_250m_Yspat_Ytag 94.780 0.122 0.773 0.974 0.997 0.116 0.973 0.997 0.996 0.948
ggplot(output_sum_seas_ann, aes(AUC, TSS, color = percent_explained, label = model)) + 
  geom_point(size = 3) +
  xlab('AUC') +
  ylab('TSS') +
  scale_color_gradientn(colors = MetBrewer::met.brewer("Greek")) +
  ggrepel::geom_label_repel(aes(label = model),
                  box.padding   = 0.35, 
                  point.padding = 0.5,
                  segment.color = 'grey50')